Smart Paper Notes

RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk

Jia Lin Hai , Marek Petrik , Mohammad Ghavamzadeh , Reazul Russel

Categories: Machine Learning | Artificial Intelligence

2022-09-09

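Prior work on safe reinforcement learning (RL) has studied risk aversion to randomness in the dynamics (aleatory uncertainty) and to model uncertainty (epistemic uncertainty) in isolation. We propose and analyze a new framework that jointly models the risk associated with epistemic and aleatory uncertainties in finite-horizon and discounted infinite-horizon MDPs. We call this framework, which combines risk-averse and soft-robust methods, RASR. We show that when risk aversion is defined using either EVaR or the entropic risk, the optimal policy in RASR can be computed efficiently using a new dynamic program formulation with a time-dependent risk level. As a result, the optimal risk-averse policies are deterministic but time-dependent, even in the infinite-horizon discounted setting. We also show that particular RASR objectives reduce to risk-averse RL with mean posterior transition probabilities. Our empirical results show that our new algorithms consistently mitigate uncertainty as measured by EVaR and other standard risk measures.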
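
The two risk measures named in the abstract can be illustrated on a small discrete reward distribution. The sketch below is not the paper's algorithm; it only evaluates the measures under one common reward-based convention, assumed here: ERM_beta[X] = -(1/beta) log E[exp(-beta X)] and EVaR_alpha[X] = sup over beta > 0 of ERM_beta[X] + log(alpha)/beta, with alpha in (0, 1]. The function names and the grid search over beta are illustrative choices.

```python
import numpy as np

def entropic_risk(x, p, beta):
    """Entropic risk of rewards x with probabilities p at risk level beta > 0.

    Assumed convention: ERM_beta[X] = -(1/beta) * log E[exp(-beta * X)].
    """
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    z = -beta * x
    m = z.max()  # shift by the maximum for a stable log-sum-exp
    return -(m + np.log(np.sum(p * np.exp(z - m)))) / beta

def evar(x, p, alpha, betas=None):
    """Approximate EVaR_alpha[X] by a grid search over beta.

    Assumed convention: sup_{beta > 0} ERM_beta[X] + log(alpha) / beta.
    The grid is a simplification; a 1-D optimizer could be used instead.
    """
    if betas is None:
        betas = np.logspace(-4, 3, 2000)
    return max(entropic_risk(x, p, b) + np.log(alpha) / b for b in betas)

rewards = [0.0, 10.0]
probs = [0.5, 0.5]
print(entropic_risk(rewards, probs, beta=1.0))
print(evar(rewards, probs, alpha=0.5))
```

Under this convention both measures lie at or below the expected reward, and a smaller alpha gives a more pessimistic value.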

Translated by Google Translate

NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results

Ren Yang , Radu Timofte , Jing Liu , Yi Xu , Xinjian Zhang , Minyi Zhao , Shuigeng Zhou , Kelvin C. K. Chan , Shangchen Zhou , Xiangyu Xu

Categories: Computer Vision

2021-04-21

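This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. The challenge uses the new Large-scale Diverse Video (LDV) dataset and has three tracks. Tracks 1 and 2 aim to enhance videos compressed by HEVC at a fixed QP, while Track 3 targets videos compressed by x265 at a fixed bit rate. Tracks 1 and 3 aim to improve fidelity (PSNR), and Track 2 aims to improve perceptual quality. The three tracks attracted 482 registrations in total. In the test phase, 12, 8 and 11 teams submitted final results for Tracks 1, 2 and 3, respectively. The proposed methods and solutions gauge the state of the art in video quality enhancement. Challenge homepage: https://github.com/renyang-home/ntire21_venh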
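
The fidelity metric named for Tracks 1 and 3, PSNR, can be sketched as follows. This is the standard textbook definition for a single frame, not the challenge's evaluation script; the 8-bit peak value of 255 and the function name are assumptions made for illustration.

```python
import numpy as np

def psnr(reference, enhanced, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two frames of equal shape.

    max_val is the peak pixel value (assumed 255 for 8-bit frames).
    """
    ref = np.asarray(reference, dtype=np.float64)  # avoid uint8 overflow
    enh = np.asarray(enhanced, dtype=np.float64)
    mse = np.mean((ref - enh) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)

# Hypothetical 4x4 frame and a copy that is off by one gray level everywhere.
frame = np.full((4, 4), 100, dtype=np.uint8)
noisy = frame + 1
print(psnr(frame, noisy))
```

A video-level score is typically obtained by averaging per-frame values, though the exact protocol used in the challenge is not stated in this abstract.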

1st Workshop on Maritime Computer Vision (MaCVi) 2023: Challenge Results

Benjamin Kiefer , Matej Kristan , Janez Perš , Lojze Žust , Fabio Poiesi , Fabio Augusto de Alcantara Andrade , Alexandre Bernardino , Matthew Dawkins , Jenni Raitoharju , Yitong Quan

Categories: Computer Vision | Artificial Intelligence | Machine Learning | Robotics

2022-11-24

The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.

BiTr-Unet: a CNN-Transformer Combined Network for MRI Brain Tumor Segmentation

Qiran Jia , Hai Shu

Categories: Artificial Intelligence | Computer Vision | Machine Learning

2021-09-25

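Convolutional neural networks (CNNs) have achieved remarkable success in automatically segmenting organs or lesions in 3D medical images. Recently, vision transformer networks have shown exceptional performance in 2D image classification tasks. Compared with CNNs, transformer networks have the appealing ability to extract long-range features thanks to their self-attention mechanism. We therefore propose a combined CNN-Transformer model, called BiTr-Unet, with specific modifications for brain tumor segmentation on multi-modal MRI scans. BiTr-Unet achieves good performance on the BraTS2021 validation dataset, with median Dice scores of 0.9335, 0.9304 and 0.8899 and median Hausdorff distances of 2.8284, 2.2361 and 1.4142 for the whole tumor, tumor core and enhancing tumor, respectively. On the BraTS2021 test dataset, the corresponding results are 0.9257, 0.9350 and 0.8874 for the Dice score and 3, 2.2361 and 1.4142 for the Hausdorff distance. The code is publicly available at https://github.com/justatinydot/bitr-unet.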
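
The Dice score reported above measures the overlap between a predicted mask and a ground-truth mask. The sketch below shows the standard definition, 2|A ∩ B| / (|A| + |B|), on binary arrays; it is not the BraTS evaluation code, and returning 1.0 when both masks are empty is an assumed convention.

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(pred, target).sum() / denom

# Hypothetical 1-D masks: one voxel overlaps, 2 + 1 voxels in total.
print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
```

For a 3D MRI volume the same function applies unchanged, since the sums run over all voxels regardless of array shape.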

A Concept Knowledge Graph for User Next Intent Prediction at Alipay

Yacheng He , Qianghuai Jia , Lin Yuan , Ruopeng Li , Yixin Ou , Ningyu Zhang

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2023-01-02

This paper illustrates the technologies of user next intent prediction with a concept knowledge graph. The system has been deployed on the Web at Alipay, serving more than 100 million daily active users. Specifically, we propose AlipayKG to explicitly characterize user intent, which is an offline concept knowledge graph in the Life-Service domain modeling the historical behaviors of users, the rich content interacted by users and the relations between them. We further introduce a Transformer-based model which integrates expert rules from the knowledge graph to infer the online user's next intent. Experimental results demonstrate that the proposed system can effectively enhance the performance of the downstream tasks while retaining explainability.

Mapping smallholder cashew plantations to inform sustainable tree crop expansion in Benin

Leikun Yin , Rahul Ghosh , Chenxi Lin , David Hale , Christoph Weigl , James Obarowski , Junxiong Zhou , Jessica Till , Xiaowei Jia , Troy Mao

Categories: Computer Vision | Machine Learning

2023-01-01

Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we successfully produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density versus low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model has an overall accuracy of 80% and the CASTC model achieved an overall accuracy of 77.9%. We found that the cashew area in Benin has doubled from 2015 to 2021 with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas has increased by 70%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.

Goal-oriented Autonomous Driving

Yihan Hu , Jiazhi Yang , Li Chen , Keyu Li , Chonghao Sima , Xizhou Zhu , Siqi Chai , Senyao Du , Tianwei Lin , Wenhai Wang

Categories: Computer Vision | Robotics

2022-12-20

Modern autonomous driving systems are organized as modular tasks in sequential order, i.e., perception, prediction and planning. As sensors and hardware improve, there is a growing trend toward systems that can perform a wide range of tasks to achieve higher-level intelligence. Contemporary approaches either deploy standalone models for individual tasks or design a multi-task paradigm with separate heads. These may suffer from accumulated errors or negative transfer. Instead, we argue that a favorable algorithmic framework should be devised and optimized in pursuit of the ultimate goal, i.e., planning of the self-driving car. Oriented toward this goal, we revisit the key components within perception and prediction. We analyze each module and prioritize the tasks hierarchically, such that all of them contribute to planning (the goal). To this end, we introduce Unified Autonomous Driving (UniAD), the first comprehensive framework to date that incorporates full-stack driving tasks in one network. It is designed to leverage the advantages of each module and to provide complementary feature abstractions for agent interaction from a global perspective. Tasks communicate through a unified query design so that they facilitate each other toward planning. We instantiate UniAD on the challenging nuScenes benchmark. Extensive ablations show that this philosophy surpasses previous state-of-the-art methods by a large margin in all aspects. The full codebase and models will be made available to facilitate future research in the community.

SDM: Spatial Diffusion Model for Large Hole Image Inpainting

Wenbo Li , Xin Yu , Kun Zhou , Yibing Song , Zhe Lin , Jiaya Jia

Categories: Computer Vision

2022-12-06

Generative adversarial networks (GANs) have made great success in image inpainting yet still have difficulties tackling large missing regions. In contrast, iterative algorithms, such as autoregressive and denoising diffusion models, have to be deployed with massive computing resources for decent effect. To overcome the respective limitations, we present a novel spatial diffusion model (SDM) that uses a few iterations to gradually deliver informative pixels to the entire image, largely enhancing the inference efficiency. Also, thanks to the proposed decoupled probabilistic modeling and spatial diffusion scheme, our method achieves high-quality large-hole completion. On multiple benchmarks, we achieve new state-of-the-art performance. Code is released at https://github.com/fenglinglwb/SDM.

DiffPose: Toward More Reliable 3D Pose Estimation

Jia Gong , Lin Geng Foo , Zhipeng Fan , Qiuhong Ke , Hossein Rahmani , Jun Liu

Categories: Computer Vision

2022-11-30

Monocular 3D human pose estimation is quite challenging due to the inherent ambiguity and occlusion, which often lead to high uncertainty and indeterminacy. On the other hand, diffusion models have recently emerged as an effective tool for generating high-quality images from noise. Inspired by their capability, we explore a novel pose estimation framework (DiffPose) that formulates 3D pose estimation as a reverse diffusion process. We incorporate novel designs into our DiffPose that facilitate the diffusion process for 3D pose estimation: a pose-specific initialization of pose uncertainty distributions, a Gaussian Mixture Model-based forward diffusion process, and a context-conditioned reverse diffusion process. Our proposed DiffPose significantly outperforms existing methods on the widely used pose estimation benchmarks Human3.6M and MPI-INF-3DHP.

Fine-Grained Entity Segmentation

Lu Qi , Jason Kuen , Weidong Guo , Tiancheng Shen , Jiuxiang Gu , Wenbo Li , Jiaya Jia , Zhe Lin , Ming-Hsuan Yang

Categories: Computer Vision

2022-11-10

In dense image segmentation tasks (e.g., semantic, panoptic), existing methods can hardly generalize well to unseen image domains, predefined classes, and image resolution & quality variations. Motivated by these observations, we construct a large-scale entity segmentation dataset to explore fine-grained entity segmentation, with a strong focus on open-world and high-quality dense segmentation. The dataset contains images spanning diverse image domains and resolutions, along with high-quality mask annotations for training and testing. Given the high-quality and -resolution nature of the dataset, we propose CropFormer for high-quality segmentation, which can improve mask prediction using high-res image crops that provide more fine-grained image details than the full image. CropFormer is the first query-based Transformer architecture that can effectively ensemble mask predictions from multiple image crops, by learning queries that can associate the same entities across the full image and its crop. With CropFormer, we achieve a significant AP gain of $1.9$ on the challenging fine-grained entity segmentation task. The dataset and code will be released at http://luqi.info/entityv2.github.io/.